Exploration vs Exploitation vs Safety: Risk-averse Multi-Armed Bandits

Authors

Nicolas Galichet

Michèle Sebag

Olivier Teytaud

Abstract

Motivated by applications in energy management, this paper presents the Multi-Armed Risk-Aware Bandit (MaRaB) algorithm. To limit the exploration of risky arms, MaRaB measures the quality of an arm by its conditional value at risk (CVaR). As the user-supplied risk level goes to 0, the arm quality tends toward the essential infimum of the arm's reward distribution, and MaRaB tends toward the MIN multi-armed bandit algorithm, which targets the arm with the maximal minimal value. As a first contribution, this paper presents a theoretical analysis of the MIN algorithm under mild assumptions, establishing its robustness compared with UCB. The analysis is supported by extensive experiments comparing MIN and MaRaB with UCB and state-of-the-art risk-aware MAB algorithms on artificial and real-world problems.
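As a rough illustration of the risk measure described above, the following Python sketch computes an empirical conditional value at risk (the mean of the worst ceil(alpha * n) observed rewards) and greedily plays the arm with the largest value. The names `empirical_cvar` and `select_arm`, the tail-size rounding, and the absence of any exploration term are assumptions of this sketch; it is not the MaRaB index from the paper. With alpha = 1 the score is the plain mean, and as alpha shrinks toward 0 it reduces to the smallest observed reward, mirroring the max-min objective the abstract attributes to MIN.

```python
import math


def empirical_cvar(rewards, alpha):
    """Empirical CVaR at level alpha: mean of the worst ceil(alpha * n) rewards.

    Low rewards are the risky outcomes, so the lower tail is averaged.
    """
    if not rewards:
        raise ValueError("need at least one observed reward")
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    # Small epsilon keeps float noise (e.g. 0.7 * 10 = 7.000000000000001)
    # from rounding the tail size up by one.
    k = max(1, math.ceil(alpha * len(rewards) - 1e-9))
    worst = sorted(rewards)[:k]
    return sum(worst) / k


def select_arm(history, alpha):
    """Greedy risk-aware choice: play each arm once, then maximise empirical CVaR.

    Assumption: no exploration bonus is added, so this is not the paper's MaRaB index.
    """
    for arm, rewards in enumerate(history):
        if not rewards:
            return arm  # every arm needs at least one sample first
    return max(range(len(history)), key=lambda a: empirical_cvar(history[a], alpha))


if __name__ == "__main__":
    # Hypothetical data: arm 0 is safe; arm 1 has the higher mean but an occasional loss.
    history = [[1.0, 1.0, 1.0], [4.0, 4.0, 4.0, -2.0]]
    print(select_arm(history, alpha=1.0))   # alpha = 1: CVaR is the mean -> 1
    print(select_arm(history, alpha=0.01))  # alpha near 0: CVaR is the worst reward -> 0
```

Because the risk level alone flips the choice between the two arms here, the sketch shows why a small alpha steers play away from arms with rare but severe losses, at the price of ignoring their higher average reward.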


Similar articles

Exploration vs Exploitation vs Safety: Risk-Aware Multi-Armed Bandits


Multi-Armed Bandits, Gittins Index, and its Calculation

Multi-armed bandit is a colorful term that refers to the dilemma faced by a gambler playing in a casino with multiple slot machines (which were colloquially called one-armed bandits). What strategy should a gambler use to pick the machine to play next? It is the one for which the posterior mean of winning is the highest and thereby maximizes current expected reward, or the one for which the ...


Risk-Aversion in Multi-armed Bandits

Stochastic multi-armed bandits solve the exploration-exploitation dilemma and ultimately maximize the expected reward. Nonetheless, in many practical problems, maximizing the expected reward is not the most desirable objective. In this paper, we introduce a novel setting based on the principle of risk-aversion where the objective is to compete against the arm with the best risk-return trade-off...


Change Point Detection and Meta-Bandits for Online Learning in Dynamic Environments

Motivated by real-time website optimization, this paper is about online learning in abruptly changing environments. Two extensions of the UCBT algorithm are combined in order to handle dynamic multi-armed bandits, and specifically to cope with fast variations in the rewards. Firstly, a change point detection test based on Page-Hinkley statistics is used to overcome the limitations due to the UCB...


Improving Online Marketing Experiments with Drifting Multi-armed Bandits

Restless bandits model the exploration vs. exploitation trade-off in a changing (non-stationary) world. Restless bandits have been studied in both the context of continuously-changing (drifting) and change-point (sudden) restlessness. In this work, we study specific classes of drifting restless bandits selected for their relevance to modelling an online website optimization process. The contrib...


Journal: CoRR

Volume: abs/1401.1123

Publication date: 2014
